6  Further Topics

6.1 Partial Differential Equations

6.2 Random Number Generation

How do we generate random numbers on a computer?

The ability to simulate random numbers on a computer is essential for modelling and understanding systems where chance and variability play a central role. Robust and repeatable random number generation is thus central to statistics, finance, queuing systems, cryptography, and a host of algorithms and other applications.

But what exactly do we mean by randomness or random numbers? For our purposes, we will consider “random numbers” to mean sequences of numbers that mimic the statistical properties of realisations of random variables drawn with a given distribution.

6.2.1 Uniform Random Numbers

Recall that a continuous random variable is a “random experiment” that takes values in \(\mathbb{R}^n\) (or some other uncountably infinite set). A discrete random variable is one that takes values in a countably infinite or finite set, like a subset of \(\mathbb{Z}\), for example. A continuous random variable is typically described in terms of its probability density function (PDF), while a discrete random variable is most often described by its probability mass function (PMF).

Example 6.1
Let \(X\) be a uniformly distributed random variable on \([a,b]\). Then \(X\) has probability density function given by \[ f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise}. \end{cases} \] If \(Y\) is a random variable that is uniformly distributed on the integers \(\{0,\dots,n-1\}\), then its probability mass function is given by \[ P(k) = \mathbb{P}[X = k] = \begin{cases} \frac{1}{n} & \text{if } k \in \{0, 1, \ldots, n-1\} \\ 0 & \text{otherwise}. \end{cases} \]

We will mainly be concerned with continuous random variables, and each continuous real-valued random variable \(X\) has an associated cumulative distribution function (CDF) (distribution function for short) that is defined by: \[ F(x) = \mathbb{P}[X \leq x]. \] Moreoever, for reasonably behaved real-valued random variables, the PDF and CDF can be related by the Fundamental Theorem of Calculus as follows: \[ f(x) = \frac{d}{dx}F(x), \quad F(x) = \int_{-\infty}^x f(y)\,dy. \]

If we know how to create a sample from a uniform distribution, then we can obtain a sample with a given distribution using the following result:

Theorem 6.1: The Inverse Transform Method
Let \(U\) be a uniformly distributed random variable on \([0,1]\), and let \(F(x)\) be a cumulative distribution function. If \(F\) is invertible, then \(X=F^{-1}(U)\) is distributed according to \(F\).

Therefore it is often enough to have samples from the uniform distribution and hence this distribution plays a central role in the theory of random number generation.

In some sense, there are no truly random number generators. Computers can only execute algorithms, which are deterministic instructions, and thus they can only yield samples which appear random. We call these numbers pseudorandom numbers, and the algorithms which produce these numbers are called pseudorandom number generators.

The theoretical wish

A generator of genuine random numbers is an algorithm that produces a sequence of random variables \(U_1\), \(U_2\), \(\ldots\) which satisfies:

  1. Each \(U_i\) is uniformly distributed between \(0\) and \(1\).
  2. The \(U_i\) are mutually independent.

Property (ii) is the more important one since the normalisation in (i) is convenient but not crucial. Property (ii) implies that all pairs of values are uncorrelated and, more generally, that the value of \(U_i\) should not be predictable from \(U_1,\ldots, U_{i-1}\). Of course, the properties listed above are those of authentically random numbers; the goal is to come as close as possible to these properties with our artificially generated pseudorandom numbers.

6.2.2 Linear Congruential Generators

An important and simple class of generators are the linear congruential generators, abbreviated as LCGs.
We need the modulo operation in order to define this class of generators.

Definition 6.1
For nonnegative integers \(x\) and \(m\), we call \(y=x \;\text{mod}\;m\) the integer remainder from the division \(x/m\); we will write this as \[ y = x \mod m. \]

Definition 6.2
A linear congruential generator (LCG) is an iteration of the form \[ \begin{aligned} x_{i+1} &= (a x_i + c) \mod m \\ u_{i+1} &= \frac{x_{i+1}}{m} \in (0,1), \end{aligned} \] where \(a\), \(c\), and \(m\) are integers.

Example 6.2
Choose \(a=6\), \(c=0\), and \(m=11\). Starting from \(x_0=1\), which is called the seed, gives \[ 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, 1, 6, \ldots \] Choosing \(a=3\) yields the sequence \[ 1, 3, 9, 5, 4, 1, \ldots \] whereas the seed \(x_0 = 2\) results in \[ 2, 6, 7, 10, 8, 2, \ldots \]

Conditions for a Full Period I

Theorem 6.2
Suppose \(c\neq 0\). The generator has full period (that is, the number of distinct values generated from any seed \(x_0\) is \(m-1\)) if and only if the following conditions hold:

  1. \(c\) and \(m\) are relatively prime (their only common divisor is \(1\)).

  2. Every prime number that divides \(m\) divides \(a-1\) as well.

  3. \(a-1\) is divisible by \(4\) if \(m\) is.

If \(m\) is a power of \(2\), the generator has full period if \(c\) is odd and \(a=4n+1\) for some integer \(n\).

Example 6.3
The Borland C++ LCG has parameters \[ m=2^{32},\quad a=22695477=1+4\times 5673869,\quad c=1. \] Hence, by the corollary above, the LCG has full period.

Conditions for a Full Period II

If \(c=0\) and \(m\) is a prime, then full period is achieved from any \(x_0 \not = 0\) when

  1. \(a^{m-1} - 1\) is a multiple of \(m\),
  2. \(a^j - 1\) is not a multiple of \(m\) for \(j=1, \ldots , m-2\).

If \(a\) satisfies these two properties it is called a primitive root of \(m\). In this situation, the sequence \(\{ x_i \}_{i \geq 1}\) is of the form \[ x_0 , a x_0 , a^2 x_0 , a^3 x_0 , \ldots \quad \mod m, \] given that \(c=0\). The sequence returns to \(x_0\) for the first time for the smallest \(k\) which satisfies \(a^k x_0 \mod m = x_0\). This is the smallest \(k\) for which \(a^k \mod m = 1\), that is \(a^k - 1\) is a multiple of \(m\).

Hence, the definition of a prime root coincides with the requirement that the series does not return to \(x_0\) until \(a^{m-1} x_0\).

Example 6.4: Examples of LCG Parameters
Modulus \(m\) Multiplier \(a\) Reference
\(2^{31} - 1\) \(16807\) Lewis, Goodman, and Miller, Park and Miller
\((=2147483647)\) \(39373\) L’Ecuyer
\(742938285\) Fishman and Moore
\(950706376\) Fishman and Moore
\(1226874159\) Fishman and Moore
\(2147483399\) \(40692\) L’Ecuyer
\(2147483563\) \(40014\) L’Ecuyer

Exercise 6.1
Define an LCG with parameters \[ c=0,\quad m=2^3-1=7,\quad a=3. \] Does this LCG have full period?

Earlier versions of Microsoft Excel (prior to 2003) relied on a linear congruential generator (LCG) for its RAND() function, which produced predictable and statistically non-random outputs that undermined the reliability of numerical simulations, Monte Carlo methods, and statistical sampling. This flaw was widely recognized in the simulation community during the late 1990s and early 2000s, leading Microsoft to update Excel’s random number algorithm in the 2003 release.

In applications, the following considerations are typically the most important when considering the appropriateness of a random number generating scheme:

  • Reproducibility
  • Speed
  • Portability
  • Period Length (if any)
  • Randomness (i.e. robust statistical properties)

Linear congruential generators are no longer (and have not been for some time) used practically, although they are a useful illustration of pseudorandom number generation. The Mersenne Twister family of algorithms is among the most popular in practice, and modern implementations are appropriate for most applications. It is notable for its extremely long period (\(2^{19937} - 1\)), strong statistical properties, and fast performance (see Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator by Matsumoto and Nishimura, 1998).

6.2.3 Testing uniformity

Previously, we have only checked for a full period, to see whether a pseudorandom number generator is reasonable or not. In this section, we demonstrate more elaborate means to test the quality of a pseudorandom number generator.

Given a sample from a supposedly uniform distribution, one can use statistical tests to reject the hypothesis of uniformity. The samples provided by a computer are fake since they are totally deterministic, and therefore not random (as such they cannot be uniformly distributed). However, they are so “well chosen”, that they might appear random. Hence we require statistical tests of randomness in order to judge the quality of a candidate PRNG.

6.2.3.1 Chi-Squared Test

Definition 6.3
Let \(k\geq 1\), and let \(X_1,\dots,X_k\) be a sequence of i.i.d. standard normally distributed random variables. The distribution of the sum of squares \[ S=X_1^2+\dots+X_k^2 \] is called chi-square with \(k\) degrees of freedom.

The probability density function of \(\chi(k)\) is given by \[ f(x,k)=\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}, \] where \(\Gamma\) denotes the gamma function, which is defined by \[ \Gamma(\xi)=\int_0^\infty x^{\xi-1}e^{-x}\,\mathrm{d}x. \] For integers \(n \in \mathbb{N}\), we can write the Gamma function in terms of the factorial function as follows: \[ \Gamma(n)=(n-1)!=(n-1)(n-2)\dots 2\cdot 1. \]

Let \[ X_1,X_2,\dots, X_n \] be a sample.

The chi-squared test for uniformity is constituted by the following: - The null hypothesis \(H_0\): The sample is from a uniform distribution against the alternative hypothesis \(H_a\) that it is not. - The test statistic \[ T=\frac{k}{n}\sum_{j=1}^k \left ( n_j-\frac{n}{k}\right)^2, \] where \(k\) is the number of equidistant partitions (“bins”, to be chosen) of the unit interval, given by \[ [0,1/k),\quad [1/k, 2/k),\dots, [(k-1)/k,1]. \] and \(n_j\) is the number of observations in the \(j\)th bin. - The confidence level \(\alpha\) (to be chosen).

The following theorem is stated without proof.

Theorem 6.3
As \(n\rightarrow\infty\), \(T\) converges, in distribution, to the chi-square distribution \(\chi_{k-1}\) with \(k-1\) degrees of freedom.

Example 6.5: Using the Chi-Square Test for Uniformity
Suppose you have a sample of pseudorandom numbers generated on the interval \([0,1]\) and you wish to test if these are consistent with being independent and uniformly distributed.

Let’s set up the test with the following parameters: - The number of bins: \(k = 10\) - The sample size: \(n = 1000\) - The observed counts in each bin: \(n_j\) for \(j = 1, \ldots, 10\)

Step 1: Compute the Test Statistic

Partition the interval \([0,1]\) into 10 bins: \([0, 0.1), [0.1, 0.2), \ldots, [0.9, 1]\).

Count how many values fall in each bin, giving counts \(n_1, n_2, \ldots, n_{10}\).

The theoretical expected count in each bin is \(n/k = 1000/10 = 100\).

The chi-square test statistic is \[ T = \frac{k}{n} \sum_{j=1}^k \left( n_j - \frac{n}{k} \right)^2 \]

Suppose the observed bin counts based on the pseudorandom sample are:

Bin Observed Count \(n_j\)
1 98
2 104
3 112
4 107
5 94
6 95
7 108
8 97
9 96
10 89

Compute the sum: \[ T = \frac{10}{1000} \sum_{j=1}^{10} (n_j - 100)^2 \] \[ T = 0.01 \left[ (98-100)^2 + (104-100)^2 + (112-100)^2 + \cdots + (89-100)^2 \right] \] \[ T = 0.01 [4 + 16 + 144 + 49 + 36 + 25 + 64 + 9 + 16 + 121] \] \[ T = 0.01 \times 484 = 4.84 \]

Step 2: Find the Critical Value

The null hypothesis is rejected if \(T\) exceeds the critical value from the chi-square distribution with \(k-1 = 9\) degrees of freedom at significance level \(\alpha\) (e.g., \(\alpha = 5%\)).

For \(\chi^2_{0.95,9}\), the critical value is approximately 16.919 (which can be computed using the Matlab function chi2inv).

Step 3: Compare \(T\) to the critical value:

  • If \(T < 16.919\), do not reject the null hypothesis (the sample is consistent with being uniform).
  • If \(T > 16.919\), reject the null hypothesis (the sample is not uniform).

In this example: \[ 4.84 < 16.919 \] So we do not reject the null hypothesis; the sample passes the uniformity test.

6.2.3.2 The Kolmogorov–Smirnov Test

Another simple test uses the empirical distribution function of the sample, which we define below:

Definition 6.4
If \(x=(x_1,\dots,x_n)\) is a sample, then the empirical distribution function of \(x\) is given by \[ F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x)}(x_i), \quad x \in \mathbb{R}. \]

Example 6.6

Comparison of the empirical and true CDFs for a sample of normally distributed pseudorandom numbers. We observe increasingly good agreement for larger sample sizes, as expected if the PRNG has good statistical properties.

The Kolmogorov–Smirnov test is based on the following intuition: If the sample is uniformly distributed, the deviation of the empirical distribution function from the theoretical distribution function should be small, as shown in the figure above.

The Kolmogorov–Smirnov test for uniformity is constituted by the following: - The null hypothesis \(H_0\): The sample is from a given distribution \(F\) against the alternative hypothesis \(H_a\) that it is not. - The test statistic \[ D_n=\sup_{x\in\mathbb R}\vert F_n(x)-F(x)\vert, \] where \(n\) is the sample size. - The confidence level \(\alpha\) (to be chosen).

As \(n\rightarrow\infty\), \(\sqrt{n} D_n\) converges to \[ \sup_{t\in\mathbb R}\vert B(F(t))\vert, \] in distribution, where \(B\) is a Brownian Bridge (i.e., the quantity \(\sup_{t\in\mathbb R}\vert B(F(t))\vert\) is a random variable).

For \(F(t)=t\), the uniform distribution, the critical values of the Kolmogorov statistics \(D_n\) are known. For large \(n\), the statistics converge in distribution to the so-called Kolmogorov distribution, which satisfies \[ K=\sup_{t\in [0,1]}\vert B(t)\vert. \] In fact, it can be shown that \[ \mathbb P[K\leq x]=1-2\sum_{k=1}^\infty (-1)^{k-1}e^{-2k^2x^2}, \quad x \in \mathbb{R}^+. \]

Along with the statistical tests outlined above, popular test suites such as Diehard and Dieharder provide batteries of statistical tests for evaluating the quality of pseudorandom number generators (PRNGs). These suites assess a PRNG’s output for statistical randomness and are widely used in research and industry to ensure reliability and unpredictability.

6.3 Monte Carlo Integration

6.3.1 A motivating example

The basic idea of Monte Carlo Integration is that we can use random sampling to estimate the value of integrals. The method is named for the famous Monte Carlo casinos. The following example illustrates the basic idea, which can be generalised and applied to a wide variety of situations.

Example 6.7
Consider the unit circle in \(\mathbb{R}^2\), i.e. the set of points \[ \{(x,y) : x^2 + y^2 \leq 1 \}. \]

We know that the area of the unit circle is \(\pi\) and we could hence consider the following way to estimate \(\pi\) by random sampling. If we take two indepedent random numbers, \(X_i\) and \(Y_i\), each uniformly random on \([-1,1]\), then \((X_i,\,Y_i)\) is a uniform random number on the square \([-1,1] \times [-1,1]\). Since the distribution is uniform, the probability that a random pair \((X_i,\,Y_i)\) falls into the unit circle is \(\pi/4\).

Looking at the two numerical experiments below, we see that in the left panel, we have \(8\) of \(10\) points falling in unit circle. This gives us an estimate of \(\pi\) of about \(3.2\). For \(N = 1000\), we can’t count so easily by eye but the estimate in fact improves to about \(3.1523\). Put another way, we are estimating the integral \[ \int\int_{x^2 + y^2 \leq 1}^{} dx dy = \int_{-1}^1 \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dx dy. \] This connection between probabilities, areas and integration is the basis of the Monte Carlow method, and we will spend the rest of this section formalising these connections.

6.3.2 The Monte Carlo Estimator

For a continuous real-valued random variable \(X\) with PDF \(f\), its mean or expectation can be computed via the formula:

\[ \mathbb{E}[X] = \int_\mathbb{R} x f(x)\,dx \]

Moreoever, if we want to calculate an expectation of a random variable \(Y = A(X)\), for some smooth function \(x \mapsto A(x)\), this is given by the related formula:

\[ \mu := \mathbb{E}[A(X)] = \int_{\mathbb{R}} A(x) f(x)\, dx. \]

However, often the function \(A\) is sufficiently complicated that the integral cannot be derived in closed form. For example, in financial applications, \(A\) might be the payoff function of an exotic option or be related to the claims policy on an insurance contract. The Monte-Carlo method (MCM) uses simulations to derive an approximation of \(\mu\) as follows:

Suppose we have a sample from the same distribution as \(X\), namely

\[ X_1,\ldots,X_n. \]

Then

\[ A(X_1), \ldots, A(X_n) \]

is a sample with the distribution of \(A(X)\). The strong law of large numbers says that the sample average converges almost surely to the expectation \(\mu\), i.e.,

\[ \lim_{n \to \infty} \frac{A(X_1) + \cdots + A(X_n)}{n} = \mathbb{E}[A(X)] = \mu, \quad \text{almost surely}. \]

The strong law applies under mild conditions (finite means and variances of the random variables), and allows us to use the sample average as an estimator for the true value of \(\mu\) for large values of \(n\).

Almost sure convergence is a mode of convergence appropriate to sequences of random variables. It is a very strong form of convergence (there are weaker forms) and roughly means that if you repeatedly observe values from a sequence of random variables, eventually the sequence will settle down and stay close to the limit for all future observations.

In practice, one runs a simulation and obtains a sample, say

\[ x_1, \ldots, x_n, \]

and thus a sample from the same distribution as \(A(X)\),

\[ A(x_1), \ldots, A(x_n). \]

For large \(n\), the SLLN therefore allows us to use the sample average, also called the Monte Carlo estimator in this context, as a proxy for \(\mu\), i.e. \[ \hat{\mu}_n := \frac{A(x_1) + \cdots + A(x_n)}{n} \approx \mu. \]

However, each time we run a new simulation the sample average delivers a new value. This begs the question:

6.3.3 How good is the approximation?

The Monte Carlo estimator has several desirable properties as a statistical estimator. In particular, the Monte Carlo estimator is:

  • Unbiased, i.e. \(\mathbb{E}[\hat{\mu}_n] = \mu\), or “the sample mean is equal to the true mean”.
  • Strongly consistent, i.e. \(\lim_{n\rightarrow\infty} \hat{\mu}_n = \mu\) almost surely.

Furthermore, we can quantify the error of the Monte Carlo estimator. If \(\sigma^2 = \text{Var}[A(X)] = \mathbb{E}[(A(X) - \mu)^2]\), the variance of \(A(X)\), was known, then we could use the Central Limit Theorem to quantify the quality of approximation of the Monte Carlo estimator. In fact, \[ \frac{\sqrt{n}}{\sigma}(\hat{\mu}_n-\mu) \approx \mathcal{N}(0,1), \quad \text{as }n \to\infty. \] In practice, we can estimate the variance of \(A(X)\) from random samples as part of the Monte Carlo Method.

Therefore, the \(\alpha\) confidence interval of \(\hat{\mu}_n\) is approximately equal to \[ \mu \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}, \] where \(z_{\beta}\) is the \(\beta\)-quantile of \(\mathcal{N}(0,1)\) distribution, i.e. \[ \mathbb{P}[- z_{\beta} \leq Z \leq z_{\beta}] = \beta, \quad Z \sim \mathcal{N}(0,1). \]

6.3.4 The practical approach

Recall that for a sample \(y_1, \ldots, y_n\), the sample mean and variance are given by \[ \bar{y} = \frac{y_1 + \cdots + y_n}{n} \qquad s = \frac{\sum_{k=1}^{n}(y_k - \bar{y})^2}{n-1}. \]

Often, one does not know the precise values of \(\mu\) or \(\sigma^2\) (especially \(\mu\), otherwise we don’t need to run Monte-Carlo simulations at all!). One thus replaces \(\mu\) and \(\sigma^2\) by their empirical estimates as follows: \[ \hat{\mu}_n \pm z_{1-\alpha/2} \frac{\hat{\sigma}_n}{\sqrt{n}}, \] where \(\hat{\mu}_n, \hat{\sigma}_n^2\) are the empirical mean and variance of the given sample \[ A(x_1), \ldots, A(x_n). \]

The quantity \(\hat{\sigma}_n/\sqrt{n}\) is called standard error. Summarizing, we have \[ S.E. = \sqrt{\frac{\sum_{i=1}^n (A(x_i) - \hat{\mu}_n)^2}{n(n-1)}}. \]

The standard error expression tells us that the error scales like \(n^{-1/2}\). For example, if we wish to make our estimate 10 times more accurate we need to increase \(n\) by a factor of 100! This is a key feature of the Monte Carlo method and persists even in a higher dimensional setting. In one-dimensional the simple trapezoidal rule has an error which scales like \(n^{-2}\) and is hence far superior to the MCM. However, in \(d\)-dimensions the trapezoidal rule error scales like \(n^{-2/d}\) while the MCM retains its \(n^{-1/2}\) error scaling. Therefore MCMs are attractive methods for quickly and accurately evaluating higher dimensional integrals whose closed form expressions are not available.

Example 6.8
Let \(X\) be a uniformly distributed r.v. on the unit interval and consider \[ \mu = \mathbb{E}[X^2]. \]

For size \(n\), let us sample from the uniform distribution, which gives us \[ x_1, x_2, \ldots, x_n \] independent copies of \(X\). The Monte Carlo estimator for \(\mu\) is just the sample average \[ \mu \approx \frac{x_1^2 + \cdots + x_n^2}{n} =: \hat{\mu}_n. \]

The theoretical value of \(\mu\) can actually be calculated explicitly, and it is given by \[ \mu = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2 dx = \left.\frac{x^3}{3}\right|_0^1 = \frac{1}{3}, \] where we recall that the uniform distribution has density \(f \equiv 1\) on \([0,1]\).

The variance \(\sigma^2\) of the Monte Carlo estimator is the Variance of \(X^2\) divided by \(n\), \(\sigma^2/n\), where \[ \sigma^2 = \mathbb{E}\left[(X^2 - \mu)^2\right] = \mathbb{E}[X^4] - \mathbb{E}\left[(X^2)\right]^2 = 1/5 - (1/3)^2 = \frac{9-5}{45} = \frac{4}{45}. \]

We conclude that the \(\alpha\) confidence interval of \(\hat{\mu}_n\) is approximately \[ \mu \pm z_{1-\alpha/2} \frac{2}{\sqrt{9n}}. \]

6.3.5 Application: Stochastic Processes

Random variables are our standard mathematical model or formalisation of a random experiment. However, in many real-world scenarios we are observing a collection of observations from the same random experiment as time evolves. Hence, we need the concept of a stochastic process, which is simply a collection of random variables indexed by time. The simplest example is the random walk:

Example 6.9: Random walk

We take time to be discrete, i.e. the set of times is \(\{0,1,2,\dots\}\). A simple symmetric random walk can be written as \[ S_0 = 0, \quad S_n = Z_1 + Z_2 + \cdots + Z_n, \] where each \(Z_i\) is an independent random variable taking the value \(+1\) or \(-1\) with equal probability. Given that we know how to simulate random numbers on a comuputer, we can immediately simulate a random walk directly from the formula above.

Three simulated paths of a symmetric random walk.